NSF PAR Search | NSF Public Access Repository

Memory Management in Complex Join Queries: A Re-evaluation Study

https://doi.org/10.1145/3698038.3698565

Jahangiri, Shiva; Carey, Michael J.; Freytag, Johann-Christoph (November 2024, ACM)

Efficient multi-join query processing is crucial but remains a complex, ongoing challenge for high-performance database management systems (DBMSs). This paper studies how different techniques for distributing memory among join operators affect several classes of multi-join query plans, under varying assumptions about memory availability and storage devices such as HDD and SSD on Amazon Web Services (AWS). We re-evaluate the results of one of the early, influential studies from the 1990s, which was originally conducted using a simulator for the Gamma database system. The main goal of our study is to scientifically re-evaluate and build upon previous studies whose results have become the basis for the design of past and modern database systems, and to provide a solid foundation for understanding basic "join physics", which is essential for eventually designing a resource-based scheduler for concurrent complex workloads.
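The abstract does not say which memory distribution techniques are compared. To illustrate why the choice matters at all, here is a purely hypothetical sketch contrasting two simple policies for splitting a fixed budget across the hash joins of one plan: an equal split, and a split proportional to each join's build-input size. The `JoinOp` type, the page-based sizes, and both policies are assumptions for illustration, not the paper's techniques.

```python
from dataclasses import dataclass


@dataclass
class JoinOp:
    # Hypothetical description of one hash join in a multi-join plan.
    name: str
    build_pages: int  # estimated build-input size in pages (assumed known)


def equal_split(joins, total_pages):
    """Give every join the same share, capped at what its build input needs."""
    share = total_pages // len(joins)
    return {j.name: min(share, j.build_pages) for j in joins}


def proportional_split(joins, total_pages):
    """Split memory in proportion to each join's build-input size."""
    total_build = sum(j.build_pages for j in joins)
    if total_build <= total_pages:
        # Every build input fits, so each join can run entirely in memory.
        return {j.name: j.build_pages for j in joins}
    # Integer division may leave a few pages unassigned; acceptable for a sketch.
    return {j.name: total_pages * j.build_pages // total_build for j in joins}


joins = [JoinOp("j1", 100), JoinOp("j2", 400), JoinOp("j3", 500)]
print(equal_split(joins, 600))         # {'j1': 100, 'j2': 200, 'j3': 200}
print(proportional_split(joins, 600))  # {'j1': 60, 'j2': 240, 'j3': 300}
```

With 600 pages, the equal split leaves 100 pages idle (j1 needs only 100) while j2 and j3 must spill, whereas the proportional split hands out all 600. Which policy performs better in practice depends on the plan shape and storage device, which is the kind of question the study examines.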
Towards a Memory-Adaptive Hybrid Hash Join Design

https://doi.org/10.1109/BigData59044.2023.10386098

Siviero, Giulliano Silva Zanotti; Jahangiri, Shiva (December 2023, 2023 IEEE International Conference on Big Data (BigData))

In database management systems (DBMSs) that handle multiple concurrent queries, adapting to fluctuating workloads is crucial. This flexibility allows the DBMS to revise its decisions based on the current workload and available resources. Because memory availability changes as queries arrive and complete, it is vital that memory-intensive operators such as the hybrid hash join adapt dynamically. This paper introduces a new memory-adaptive hash-based join design, implemented in Apache AsterixDB, and evaluates its responsiveness to memory variability.
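To make "dynamically adapt" concrete, the following is a minimal sketch of one way a hybrid hash join's build side could react to budget changes: it tracks which partitions are memory-resident and spills or reloads whole partitions when the budget shrinks or grows. The class name, the spill-largest-first and reload-smallest-first policies, and page-level accounting are all assumptions; this is not the AsterixDB design, and it ignores probe-side tuples, which a real hybrid hash join must also route to disk for spilled partitions.

```python
class AdaptiveBuildSide:
    """Hypothetical sketch: tracks which build-side partitions of a hybrid hash
    join are memory-resident and spills or reloads them as the budget changes."""

    def __init__(self, partition_pages, budget_pages):
        self.sizes = dict(enumerate(partition_pages))  # partition id -> pages
        self.resident = set(self.sizes)                 # start fully in memory
        self.spilled = set()
        self.budget = 0
        self.set_budget(budget_pages)

    def used(self):
        return sum(self.sizes[p] for p in self.resident)

    def set_budget(self, budget_pages):
        self.budget = budget_pages
        # Budget shrank: spill the largest resident partitions until we fit.
        while self.used() > self.budget and self.resident:
            victim = max(self.resident, key=lambda p: self.sizes[p])
            self.resident.remove(victim)
            self.spilled.add(victim)  # a real system would write it to disk
        # Budget grew: reload the smallest spilled partitions that still fit.
        for p in sorted(self.spilled, key=lambda q: self.sizes[q]):
            if self.used() + self.sizes[p] <= self.budget:
                self.spilled.remove(p)
                self.resident.add(p)  # a real system would read it back


build = AdaptiveBuildSide([10, 30, 20], budget_pages=60)
build.set_budget(25)  # memory taken away, e.g. by a newly arriving query
print(sorted(build.resident), sorted(build.spilled))  # [0] [1, 2]
build.set_budget(50)  # memory returned, e.g. when another query finishes
print(sorted(build.resident), sorted(build.spilled))  # [0, 2] [1]
```

Spilling the largest partition first frees the required memory with the fewest spill operations, while reloading the smallest first keeps as many partitions in memory as possible; other victim-selection policies are equally plausible.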
Wisconsin Benchmark Data Generator: To JSON and Beyond

https://doi.org/10.1145/3448016.3450577

Jahangiri, Shiva (June 2021, Proceedings of the 2021 ACM SIGMOD Conference)
Re-evaluating the Performance Trade-offs for Hash-Based Multi-Join Queries

https://doi.org/10.1145/3318464.3384406

Jahangiri, Shiva (May 2020, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD’20))

As one of the most common and expensive operators in a database management system, join plays an important role in the system's query response time and/or throughput. Although the processing and performance evaluation of multi-join queries have been topics of research for decades [8, 12, 13], the complexity and multi-dimensional nature of the problem leave it unsolved for the database community. Our work studies the performance of different classes of query plans, memory distributions among join operators, and intra-query concurrency under different assumptions about memory availability and storage devices such as HDD and SSD. This provides the foundation for understanding basic “join physics”, which is useful for designing a resource-based query scheduler for concurrent workloads. We use AsterixDB [1], on both HDD and SSD, to re-evaluate the results of one of the early impactful studies from the 1990s [12] that was originally done using a simulator for the Gamma database system [4].
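One reason plan class interacts with memory and intra-query concurrency can be shown with a simplified model of hash-join trees. Under the convention sketched here (an assumption, not taken from the paper), a right-deep tree hashes every build input before a single probe pipeline runs through all joins, so all hash tables must be resident at once; a left-deep tree hashes each join's output as the next join's build input, so at most two tables are live at a time. Sizes in pages, the function names, and the neglect of hash-table overhead are all hypothetical simplifications.

```python
def right_deep_peak(build_pages):
    """Right-deep tree: every join's build input is hashed before the single
    probe pipeline starts, so all hash tables are resident at once."""
    return sum(build_pages)


def left_deep_peak(first_build_pages, hashed_intermediate_pages):
    """Left-deep tree: each join probes the table built on its build input (a
    base relation for the first join, the previous join's output afterwards)
    while hashing its own output for the next join. At most two tables are
    live at a time; the final join's output is not hashed."""
    tables = [first_build_pages] + list(hashed_intermediate_pages)
    pairs = [a + b for a, b in zip(tables, tables[1:])]
    return max(pairs + [tables[-1]])


# Four joins; every base relation and intermediate result is 100 pages.
print(right_deep_peak([100, 100, 100, 100]))  # 400
print(left_deep_peak(100, [100, 100, 100]))   # 200
```

In this toy model the right-deep plan needs twice the memory but allows all four joins to run as one pipeline, which is the kind of memory-versus-concurrency trade-off the study measures on real hardware.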
Robust and efficient memory management in Apache AsterixDB

https://doi.org/10.1002/spe.2799

Kim, Taewoo; Behm, Alexander; Blow, Michael; Borkar, Vinayak; Bu, Yingyi; Carey, Michael J.; Hubail, Murtadha; Jahangiri, Shiva; Jia, Jianfeng; Li, Chen; et al. (February 2020, Software: Practice and Experience)

Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory-intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory-intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory-intensive operators requires careful design and the application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need to handle large amounts of data in a similar way, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open-source Big Data management software platform that scales out horizontally on shared-nothing commodity computing clusters. We describe the implementation of AsterixDB's memory-intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.
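The budget scheme described above (a buffer cache plus a working-memory pool from which memory-intensive operators receive budgets) can be modeled in a few lines. This is a minimal sketch under stated assumptions: the `MemoryManager` class, page-based accounting, and the grant-or-refuse policy are hypothetical and do not reflect AsterixDB's actual API or configuration.

```python
class MemoryManager:
    """Hypothetical node-level memory manager: a fixed buffer cache plus a
    shared working-memory pool from which memory-intensive operators
    (sorts, joins) receive budgets measured in pages."""

    def __init__(self, total_pages, buffer_cache_pages):
        if not 0 <= buffer_cache_pages <= total_pages:
            raise ValueError("buffer cache must fit within total memory")
        self.buffer_cache_pages = buffer_cache_pages
        self.free_working = total_pages - buffer_cache_pages
        self.grants = {}  # operator id -> pages currently granted

    def request(self, op_id, desired, minimum):
        """Grant up to `desired` pages, or return None if fewer than
        `minimum` pages are free (the caller would then wait or queue)."""
        if op_id in self.grants:
            raise ValueError(f"{op_id} already holds a budget")
        if minimum > self.free_working:
            return None
        granted = min(desired, self.free_working)
        self.free_working -= granted
        self.grants[op_id] = granted
        return granted

    def release(self, op_id):
        """Return an operator's pages to the working-memory pool."""
        self.free_working += self.grants.pop(op_id)


mm = MemoryManager(total_pages=1000, buffer_cache_pages=400)
print(mm.request("sort1", desired=500, minimum=10))  # 500
print(mm.request("join1", desired=300, minimum=50))  # 100 (only 100 left)
print(mm.request("join2", desired=200, minimum=20))  # None (pool exhausted)
mm.release("sort1")
print(mm.request("join2", desired=200, minimum=20))  # 200
```

Granting less than the desired amount (as with `join1`) mirrors the idea that an operator tries to maximize its memory but must still run, with more disk I/O, on a smaller budget; the minimum guards against admitting an operator that cannot run at all.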